81 research outputs found

Cepstral trajectories in linguistic units for text-independent speaker recognition

Author: D.A. Reynolds
E. Shriberg
N. Dehak
P. Kenny
T. Kinnunen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-35292-8_3
Proceedings of IberSPEECH, held in Madrid (Spain) in 2012.
In this paper, the contributions of different linguistic units to the speaker recognition task are explored by means of temporal trajectories of their MFCC features. Inspired by successful work in forensic speaker identification, we extend the approach based on temporal contours of formant frequencies in linguistic units to design a fully automatic system that puts together both forensic and automatic speaker recognition worlds. The combination of MFCC features and unit-dependent trajectories provides a powerful tool to extract individualizing information. At a fine-grained level, we provide a calibrated likelihood ratio per linguistic unit under analysis (extremely useful in applications such as forensics), and at a coarse-grained level, we combine the individual contributions of the different units to obtain a highly discriminative single system. This approach has been tested with NIST SRE 2006 datasets and protocols, consisting of 9,720 trials from 219 male speakers for the 1side-1side English-only task, with development data extracted from 367 male speakers in 1,808 conversations from the NIST SRE 2004 and 2005 datasets.
Supported by MEC grant PR-2010-123, MICINN project TEC09-14179, ForBayes project CCG10-UAM/TIC-5792 and Cátedra UAM-Telefónica.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

Unimodal late fusion for NIST i-vector challenge on speaker detection

Author: Artur S. d'Avila Garcez
Dass S.C.
Dehak N.
Gunes H.
Hazrat Ali
Khalid Iqbal
Lip C.C.
Snoek C.G.M.
Son N. Tran
Xianwei Zhou
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 17/07/2014

Speaker detection is a very interesting machine learning task for which the latest i-vector challenge has been coordinated by the National Institute of Standards and Technology (NIST). A simple late fusion approach for the speaker detection task on the i-vector challenge is presented. The approach is based on the late fusion of scores from the cosine distance method (the baseline) and the scores obtained from linear discriminant analysis. The results show that by adapting the simple late fusion approach, the framework can outperform the baseline score for the decision cost function on the NIST i-vector machine learning challenge
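
The score-level late fusion described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the per-trial scores of the two systems (cosine baseline and LDA-based) are already available as lists, z-normalizes each list, and takes a weighted sum. The function names `cosine_score`, `z_normalize` and `late_fusion`, and the `weight` parameter, are hypothetical; the abstract does not state which normalization or weighting was actually used.

```python
import math
from statistics import mean, pstdev


def cosine_score(a, b):
    """Cosine similarity between two i-vectors given as equal-length sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def z_normalize(scores):
    """Shift and scale scores to zero mean and unit variance (assumed normalization)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: no discriminative information to rescale.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def late_fusion(scores_a, scores_b, weight=0.5):
    """Weighted sum of two normalized score lists, one entry per trial."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same trials")
    norm_a = z_normalize(scores_a)
    norm_b = z_normalize(scores_b)
    return [weight * a + (1.0 - weight) * b for a, b in zip(norm_a, norm_b)]
```

Normalizing before summing keeps one system from dominating the fused score merely because its raw scores span a wider range; in practice the weight would be tuned on development trials.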

City Research Online

Crossref

Improved i-Vector Representation for Speaker Diarization

Author: G Hinton
I McLoughlin
Ian McLoughlin
Kui Wu
N Dehak
P Kenny
S Tranter
Y Song
Yan Song
Yan Xu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

This paper proposes using a previously well-trained deep neural network (DNN) to enhance the i-vector representation used for speaker diarization. In effect, we replace the Gaussian Mixture Model (GMM) typically used to train a Universal Background Model (UBM), with a DNN that has been trained using a different large scale dataset. To train the T-matrix we use a supervised UBM obtained from the DNN using filterbank input features to calculate the posterior information, and then MFCC features to train the UBM instead of a traditional unsupervised UBM derived from single features. Next we jointly use DNN and MFCC features to calculate the zeroth and first order Baum-Welch statistics for training an extractor from which we obtain the i-vector. The system will be shown to achieve a significant improvement on the NIST 2008 speaker recognition evaluation (SRE) telephone data task compared to state-of-the-art approaches

Crossref

Springer - Publisher Connector

Kent Academic Repository

Sesquiterpenes from aerial parts of Ferula vesceritensis

Author: C. Bayet
D. Guilet
K. Oughlissi-Dehak
M. Hadj-Mahammed
M.G. Dijoux-Franca
N. Darbour
P. Lawton
S. Michalet
Y.A. Badjah-Hadj-Ahmed
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008

From the dichloromethane extract of aerial parts of Ferula vesceritensis (Apiaceae), 11 sesquiterpene derivatives were isolated. Five of them were new compounds, designated as 10-hydroxylancerodiol-6-anisate, 2,10-diacetyl-8-hydroxyferutriol-6-anisate, 10-hydroxylancerodiol-6-benzoate, vesceritenone and epoxy-vesceritenol. The six known compounds were identified as feselol, farnesiferol A, lapidol, 2-acetyl-jaeschkeanadiol-6-anisate, lasidiol-10-anisate and 10-oxo-jaeschkeanadiol-6-anisate. All the structures were determined by extensive spectroscopic studies, including 1D and 2D NMR experiments and mass spectrometry analysis. Two of the compounds, the sesquiterpene coumarins farnesiferol A and feselol, bound to the model recombinant nucleotide-binding site of an MDR-like efflux pump from the enteropathogenic protozoan Cryptosporidium parvum.

HAL Descartes

Okina

Speaker recognition with hybrid features from a deep belief network

Author: AR Mohamed
Artur S. d’Avila Garcez
C Burges
Emmanouil Benetos
F Richardson
GE Hinton
H Ali
H Lee
Hazrat Ali
L Deng
N Dehak
N Roux Le
Son N. Tran
T Kinnunen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/08/2016

Learning representation from audio data has shown advantages over the handcrafted features such as mel-frequency cepstral coefficients (MFCCs) in many audio applications. In most of the representation learning approaches, the connectionist systems have been used to learn and extract latent features from the fixed length data. In this paper, we propose an approach to combine the learned features and the MFCC features for speaker recognition task, which can be applied to audio scripts of different lengths. In particular, we study the use of features from different levels of deep belief network for quantizing the audio data into vectors of audio word counts. These vectors represent the audio scripts of different lengths that make them easier to train a classifier. We show in the experiment that the audio word count vectors generated from mixture of DBN features at different layers give better performance than the MFCC features. We also can achieve further improvement by combining the audio word count vector and the MFCC features

City Research Online

Crossref

University of Tasmania Open Access Repository

Queen Mary Research Online

Automatic Smoker Detection from Telephone Speech Signals

Author: AH Poorjam
CM Bishop
D Sorensen
G Dobry
J Gonzalez
JR Deller
M Zweig
MH Bahari
N Dehak
P Kenny
R Yager
W Campbell
X Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/08/2017

Crossref

VBN

NeuroSpeech

Author: Arora R.
Bocklet T.
Cernak M.
Chinaei H.
Christensen H.
Dehak N.
Hannink J.
Nidadavolu P.S.
Nöth E.
Orozco-Arroyave J.R.
Rudzicz F.
Vann A.
Vargas-Bonilla J.F.
Vogler N.
Vásquez-Correa J.C.
Yancheva M.
Publication venue: 'Elsevier BV'
Publication date: 01/07/2018

NeuroSpeech is software for modeling pathological speech signals along four speech dimensions: phonation, articulation, prosody, and intelligibility. Although it was developed to model dysarthric speech signals from Parkinson's patients, its structure allows other computer scientists or developers to include other pathologies and/or measures. Different tasks can be performed: (1) modeling of the signals considering the aforementioned speech dimensions, (2) automatic discrimination of Parkinson's vs. non-Parkinson's, and (3) prediction of the neurological state according to the Unified Parkinson's Disease Rating Scale (UPDRS) score. The prediction of the dysarthria level according to the Frenchay Dysarthria Assessment scale is also provided.

Crossref

Directory of Open Access Journals

White Rose Research Online

Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks

Author: A Graves
A Lozano-Diez
A rahman Mohamed
Alicia Lozano-Diez
CM Bishop
D Martinez
D Reynolds
D Yu
Doroteo T. Toledano
F Gers
F Richardson
F Weninger
FA Gers
G Hinton
H Li
Ian McLoughlin
J Gonzalez-Dominguez
J Schmidhuber
Javier Gonzalez-Dominguez
Joaquin Gonzalez-Rodriguez
M Van Segbroeck
N Dehak
P Kenny
PA Torres-Carrasquillo
Ruben Zazo
Y Song
YK Muthusamy
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

Zazo R, Lozano-Diez A, Gonzalez-Dominguez J, T. Toledano D, Gonzalez-Rodriguez J (2016) Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks. PLoS ONE 11(1): e0146917. doi:10.1371/journal.pone.0146917
Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vector and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (approximately 3 s). In this contribution we present an open-source, end-to-end LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system on a subset of the NIST Language Recognition Evaluation (8 target languages, 3 s task) by up to 26%. This result is in line with previously published research using proprietary LSTM implementations and huge computational resources, which made those earlier results hard to reproduce. Further, we extend those previous experiments by modeling unseen languages (out-of-set, OOS, modeling), which is crucial in real applications. Results show that an LSTM RNN with OOS modeling is able to detect these languages and generalizes robustly to unseen OOS languages. Finally, we also analyze the effect of even more limited test data (from 2.25 s to 0.1 s), showing that with as little as 0.5 s an accuracy of over 50% can be achieved.
This work has been supported by project CMC-V2: Caracterizacion, Modelado y Compensacion de Variabilidad en la Señal de Voz (TEC2012-37585-C02-01), funded by Ministerio de Economia y Competitividad, Spain.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Directory of Open Access Journals

PubMed Central

Biblos-e Archivo

An investigation of supervector regression for forensic voice comparison on small data

The present paper deals with an observer design for a nonlinear lateral vehicle model. The nonlinear model is represented by an exact Takagi-Sugeno (TS) model via the sector nonlinearity transformation. A proportional multiple integral observer (PMIO) based on the TS model is designed to estimate simultaneously the state vector and the unknown input (road curvature). The convergence conditions of the estimation error are expressed in LMI form using Lyapunov theory, which guarantees a bounded error. Simulations are carried out and experimental results are provided to illustrate the proposed observer.

HAL Evry

Crossref

Springer - Publisher Connector